Introduction

Tax Increment Financing

Tax Increment Financing districts are a funding tool to “build and repair roads and infrastructure, clean polluted land and put vacant properties back to productive use, usually in conjunction with private development projects”.

Funds are generated by the increase in property values in the designated areas. The property taxes generated by any growth in value after the area is designated are then used to fund projects. In other words, the money comes only from the difference between property values at designation and property values afterward.
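The increment arithmetic can be sketched as follows. This is a minimal illustration, not the statutory calculation: the function name, the dollar figures and the tax rate are all hypothetical, and flooring the increment at zero when values fall below the base is an assumption.

```python
def tif_increment_revenue(base_value, current_value, tax_rate):
    """Yearly property tax revenue attributable to growth since designation.

    base_value:    total assessed value when the district was designated
    current_value: total assessed value in the year of interest
    tax_rate:      combined property tax rate (hypothetical flat rate)
    """
    # Only growth above the base counts; assume no negative increment.
    increment = max(current_value - base_value, 0)
    return increment * tax_rate


# Hypothetical district: values are made up for illustration only.
base_value = 10_000_000
current_value = 12_500_000
tax_rate = 0.07

revenue = tif_increment_revenue(base_value, current_value, tax_rate)
no_growth = tif_increment_revenue(base_value, 9_000_000, tax_rate)
print(f"TIF revenue for the year: {revenue:,.0f}")
print(f"TIF revenue if values fall below the base: {no_growth:,.0f}")
```

Taxes on the base value itself are not part of the increment, which is why only the 2,500,000 of growth is multiplied by the rate here.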

The areas themselves must be characterized by blight, meaning they potentially have excessive vacancies, a lack of physical maintenance, a lack of community planning, dilapidation, etc. That is how they are supposed to work.

Data Variable and Variable Characteristics

The census data used includes income data, mean commute time in minutes and property value. All census data is aggregated by tract, which was the basis of all joins used for the final processed input data. CMAP data used includes metrics on transit accessibility, household density and employment accessibility. Finally, the boundaries for TIFs and TIF expenditures were obtained from the Chicago Data Portal. All of this data was combined using spatial joins and joins by census tract ID wherever possible.
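A join by census tract ID can be sketched roughly as below. The tract IDs and values are made up, the column names are borrowed from the correlation output, and the choice of an inner join (keeping only tracts present in every source) is an assumption rather than a description of the actual pipeline.

```python
def join_by_tract(*tables):
    """Inner-join several {tract_id: {column: value}} tables on tract ID."""
    shared_ids = set(tables[0])
    for table in tables[1:]:
        shared_ids &= set(table)  # keep only tracts found in every table

    joined = {}
    for tract_id in sorted(shared_ids):
        row = {}
        for table in tables:
            row.update(table[tract_id])  # merge columns from each source
        joined[tract_id] = row
    return joined


# Hypothetical inputs: IDs and numbers are illustrative only.
census = {
    "17031000001": {"household_incomeE": 52000, "prop_value_medE": 194200},
    "17031000002": {"household_incomeE": 98000, "prop_value_medE": 418300},
}
cmap = {
    "17031000001": {"trans_avail_med": 4.75, "walk_score_med": 75.5},
    "17031000003": {"trans_avail_med": 5.00, "walk_score_med": 117.5},
}

combined = join_by_tract(census, cmap)
print(combined)
```

Only the tract that appears in both sources survives the join; a left or full join would be the alternative if unmatched tracts should be kept with missing values.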

When we look at transit accessibility versus property value, we find a weak positive correlation (about 0.19 between trans_avail_med and prop_value_medE), indicating that areas with higher property values tend to have somewhat more access to transit and, to some degree, more access to employment (about 0.28 between emp_mile_med and prop_value_medE).

##                   household_incomeE emp_mile_med prop_value_medE HH_per_acr_med
## household_incomeE       1.000000000   0.60083502      0.56145497     0.41874882
## emp_mile_med            0.600835020   1.00000000      0.28332164     0.32925726
## prop_value_medE         0.561454968   0.28332164      1.00000000     0.45070259
## HH_per_acr_med          0.418748819   0.32925726      0.45070259     1.00000000
## walk_score_med          0.286211447   0.28588779      0.49809493     0.59034926
## trans_avail_med         0.001746722   0.13244741      0.18860144     0.42747352
## tif_expend_med         -0.024268324   0.03423361     -0.04756442     0.01228008
##                   walk_score_med trans_avail_med tif_expend_med
## household_incomeE     0.28621145     0.001746722    -0.02426832
## emp_mile_med          0.28588779     0.132447413     0.03423361
## prop_value_medE       0.49809493     0.188601442    -0.04756442
## HH_per_acr_med        0.59034926     0.427473520     0.01228008
## walk_score_med        1.00000000     0.703326418     0.08768285
## trans_avail_med       0.70332642     1.000000000     0.11132661
## tif_expend_med        0.08768285     0.111326613     1.00000000

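Property value is moderately correlated with walkability (about 0.50). TIF expenditure, by contrast, is only weakly correlated with every other variable: none of its coefficients exceeds about 0.11 in absolute value.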
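For reference, a pairwise correlation matrix like the one above can be computed as in this sketch. It assumes Pearson's r (the default method of R's cor()); the two toy columns are invented and merely reuse variable names from the output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data, made up for illustration only.
toy = {
    "walk_score_med": [1, 2, 3, 4, 5],
    "prop_value_medE": [2, 1, 4, 3, 5],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

The matrix is symmetric with ones on the diagonal, which is why each coefficient in the output above appears twice.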

Clustering for Abstract Socioeconomic Proxies

In this section we try several clustering methods to differentiate tiers of socioeconomic quality of life.

K-Means Clustering

While it appears that the more clusters we have, the better the results, this does not necessarily make sense for what we are trying to show. For example, how does one interpret 10 tiers of development versus 3 or fewer? Additionally, the actual optimal number according to the “tightness” of the clusters (how close the individual points within a cluster are to each other) is 2.

means_2  hh_income  prop_value  transit_avail  walkability  km_label_1
1         82911600      194200           4.75         75.5  Low Development
2        198443300      418300           5.00        117.5  High Development

means_3  hh_income  prop_value  transit_avail  walkability  km_label_2
1         75956100      185600           4.75         71.5  Low Development
3        154461500      357700           5.00        107.0  Medium Development
2        617522900      410400           5.00        132.5  High Development
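The trade-off between the number of clusters and tightness can be illustrated with a small k-means sketch. It is not the notebook's code: the points are toy data, the farthest-point initialization is chosen only to make the run deterministic, and within-cluster sum of squares (WCSS) is an assumed stand-in for the tightness measure, which the text does not name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns labels, centers and the WCSS."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss


# Toy data with two obvious groups (illustrative only).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
wcss_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

WCSS keeps shrinking as k grows even though the toy data has only two real groups, which is why a raw improvement in fit is not by itself a reason to add tiers.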

Kernel K-Means Clustering

## # A tibble: 3 × 5
##   kkm_label          hh_income prop_value transit_avail walkability
##   <chr>                  <dbl>      <dbl>         <dbl>       <dbl>
## 1 Low Development     61574000     179900          4.88        72.5
## 2 Medium Development 149208000     250300          4.5         82.5
## 3 High Development   177349000     428600          5          119.

DBSCAN

## # A tibble: 4 × 5
##   db_label           hh_income prop_value transit_avail walkability
##   <chr>                  <dbl>      <dbl>         <dbl>       <dbl>
## 1 Low Development     93886600     220000          5           82.8
## 2 Medium Development 182322500     159250          4.89        80  
## 3 High Development   301013800     393900          4.12        75  
## 4 Excluded           397491400     413050          5          122
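Unlike k-means, DBSCAN can leave points unassigned. The sketch below shows the mechanism on toy data, assuming the "Excluded" row corresponds to points DBSCAN marks as noise; the eps and min_pts values are arbitrary and not the ones used for the table above.

```python
NOISE = -1


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if sq_dist(points[i], q) <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Basic DBSCAN; returns one label per point, NOISE for excluded points."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Two dense toy blobs and one isolated point (illustrative only).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 5),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point has too few neighbors to start or join a cluster, so it stays labeled as noise; with real data, a poorly chosen eps can push a large share of observations into that group.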

GIS Visualizations

These are visualizations created for a presentation on the methods and results of using clustering for geographic policy analysis. They are based on older ACS data and therefore may not represent the data shown above in the other sections of this notebook. See “Final Presentation.pdf” for the presentation file.

Additional Statistical Tests

In the map visualizations we see a mixture of socioeconomic partitions throughout the city. What proportion of the city is highly developed?

## 
##   High Development    Low Development Medium Development 
##                 29                923                440
## 
##  1-sample proportions test with continuity correction
## 
## data:  29 out of length(chicago_transit_acs_tif_aug$km_label_2), null probability 1/3
## X-squared = 610.31, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is less than 0.3333333
## 95 percent confidence interval:
##  0.00000000 0.02853245
## sample estimates:
##          p 
## 0.02083333

The p-value for the test is well below a significance level of 0.10. Therefore the null hypothesis is rejected, and we can say that less than a third of the city is highly developed: the sample estimate is about 2.1% (29 of 1,392 observations).
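The test statistic above can be reproduced by hand. This sketch assumes the usual chi-squared form with Yates' continuity correction and a one-sided normal p-value; the function name is ours, and the confidence interval is not recomputed.

```python
import math


def one_sample_prop_test_less(successes, n, p0):
    """One-sample proportion test (H1: p < p0) with continuity correction."""
    expected = n * p0
    # Yates' correction, capped so it never exceeds the observed deviation.
    yates = min(0.5, abs(successes - expected))
    chi_sq = (abs(successes - expected) - yates) ** 2 / (n * p0 * (1 - p0))
    # Signed z-score: negative when the estimate is below p0.
    z = math.copysign(math.sqrt(chi_sq), successes / n - p0)
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return successes / n, chi_sq, p_value


# Counts taken from the label table above.
counts = {"High Development": 29, "Low Development": 923, "Medium Development": 440}
n = sum(counts.values())

estimate, chi_sq, p_value = one_sample_prop_test_less(
    counts["High Development"], n, 1 / 3
)
print(f"n = {n}, estimate = {estimate:.8f}")
print(f"X-squared = {chi_sq:.2f}, p-value = {p_value:.3g}")
```

With 29 of 1,392 observations labeled High Development, the observed count sits far below the 464 expected under a one-third share, which is what drives the very large statistic.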

Conclusion

Although there aren’t any easy metrics to compare across models, standard k-means clustering appears to work best for a few reasons. First, it produces, visually at least, the best separation between clusters. Second, it provides better coverage of the points, whereas density-based methods often exclude large chunks of data. Third, when comparing the ordering of the subgroup statistics, i.e. the socioeconomic measures, k-means and kernel k-means had the orderings that made the most sense. DBSCAN sometimes produced orderings in which property value decreased as the socioeconomic tier increased, which is counterintuitive. Regardless, all of the models produced somewhat sensible results in terms of the aggregate statistics of the groups associated with each label.

TIF-district-based economic policy tends to have a strong supply-side character. Therefore, no matter how one tries to optimize it, the immediate benefits of the policy will go to those who have a more direct ability to participate in development and the decisions around it: land owners, shareholders, corporate managers, etc. On the technical side, there are plenty of ways to change the hypothesis space or the model that determines the economic zones. The variables could be chosen to reflect more of the industrial supply side of the economy, such as the value of exports from an area, as opposed to more consumer- or residential-related statistics.